    Improving peak picking using multiple time-step loss functions

    The majority of state-of-the-art methods for music information retrieval (MIR) tasks now utilise deep learning methods reliant on minimisation of loss functions such as cross entropy. For tasks that include framewise binary classification (e.g., onset detection, music transcription), classes are derived from output activation functions by identifying points of local maxima, or peaks. However, the operating principles behind peak picking are different to those of the cross entropy loss function, which minimises the absolute difference between the output and target values for a single frame. To generate activation functions more suited to peak picking, we propose two versions of a new loss function that incorporates information from multiple time-steps: 1) multi-individual, which uses multiple individual time-step cross entropies; and 2) multi-difference, which directly compares the difference between sequential time-step outputs. We evaluate the newly proposed loss functions alongside standard cross entropy in the popular MIR tasks of onset detection and automatic drum transcription. The results highlight the effectiveness of these loss functions in improving overall system accuracies for both MIR tasks. Additionally, directly comparing the output from sequential time-steps in the multi-difference approach achieves the highest performance.
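
    As a rough illustration of the idea (not the paper's exact formulation), a multi-individual loss can sum cross entropies against targets at neighbouring time-steps, while a multi-difference loss can compare the frame-to-frame differences of the output sequence against those of the target sequence. The PyTorch sketch below uses illustrative names and window sizes throughout.

        import torch
        import torch.nn.functional as F

        def multi_individual_loss(output, target, n_steps=3):
            # Sum framewise binary cross entropies against targets at
            # neighbouring time-steps as well as the current one.
            # output, target: (batch, time); output already in [0, 1].
            loss = F.binary_cross_entropy(output, target)
            for k in range(1, n_steps):
                loss = loss + F.binary_cross_entropy(output[:, k:], target[:, :-k])
                loss = loss + F.binary_cross_entropy(output[:, :-k], target[:, k:])
            return loss

        def multi_difference_loss(output, target):
            # Penalise mismatch between sequential-frame differences of the
            # output and of the target, favouring sharp, peak-like activations.
            d_out = output[:, 1:] - output[:, :-1]
            d_tgt = target[:, 1:] - target[:, :-1]
            return F.binary_cross_entropy(output, target) + F.mse_loss(d_out, d_tgt)

        # Example: random activations against sparse binary onset targets.
        out = torch.sigmoid(torch.randn(2, 100))
        tgt = (torch.rand(2, 100) > 0.95).float()
        print(multi_individual_loss(out, tgt), multi_difference_loss(out, tgt))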

    Audio style transfer with rhythmic constraints

    We present a rhythmically constrained audio style transfer technique for automatic mixing and mashing of two audio inputs. In this transformation, the rhythmic and timbral features of both input signals are combined through an audio style transfer process that transforms the files so that they adhere to the larger metrical structure of the chosen input. This is accomplished by finding beat boundaries of both inputs and performing the transformation on beat-length audio segments. In order for the system to perform a mashup between two signals, we reformulate the previously used audio style transfer loss terms into three loss functions and enable them to be independent of the input. We measure and compare rhythmic similarities of the transformed and input audio signals using their rhythmic envelopes to investigate the influence of the tested transformation objectives.
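
    One plausible way to obtain the beat-length segments the transformation operates on is to beat-track each input and slice the signal at the detected boundaries. The librosa-based sketch below is illustrative and not the authors' implementation.

        import librosa

        def beat_segments(path):
            # Load audio, estimate beat positions, and slice the signal
            # into beat-length segments between consecutive boundaries.
            y, sr = librosa.load(path, sr=None)
            tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
            boundaries = librosa.frames_to_samples(beat_frames)
            segments = [y[s:e] for s, e in zip(boundaries[:-1], boundaries[1:])]
            return segments, sr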

    Player vs Transcriber: A game approach to automatic music transcription

    State-of-the-art automatic drum transcription (ADT) approaches utilise deep learning methods reliant on time-consuming manual annotations and require congruence between training and testing data. When these conditions are not held, they often fail to generalise. We propose a game approach to ADT, termed player vs transcriber (PvT), in which a player model aims to reduce the transcription accuracy of a transcriber model by manipulating training data in two ways. First, existing data may be augmented, allowing the transcriber to be trained using recordings with modified timbres. Second, additional individual recordings from sample libraries are included to generate rare combinations. We present three versions of the PvT model: AugExist, which augments pre-existing recordings; AugAddExist, which adds additional samples of drum hits to the AugExist system; and Generate, which generates training examples exclusively from individual drum hits from sample libraries. The three versions are evaluated alongside a state-of-the-art deep learning ADT system using two evaluation strategies. The results demonstrate that including the player network improves ADT performance and suggest that this is due to improved generalisability. The results also indicate that although the Generate model achieves relatively low results, it is a viable choice when annotations are not accessible.
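
    Conceptually, the player and transcriber are trained adversarially: the player perturbs the training data to raise the transcriber's loss, while the transcriber learns to transcribe the perturbed data. A minimal PyTorch sketch of this alternating scheme follows; the architectures, feature dimensions and learning rates are placeholders, not those of the paper.

        import torch
        import torch.nn as nn

        player = nn.Sequential(nn.Linear(128, 128), nn.Tanh())        # proposes timbre modifications
        transcriber = nn.Sequential(nn.Linear(128, 3), nn.Sigmoid())  # framewise kick/snare/hi-hat

        opt_p = torch.optim.Adam(player.parameters(), lr=1e-4)
        opt_t = torch.optim.Adam(transcriber.parameters(), lr=1e-4)
        bce = nn.BCELoss()

        features = torch.rand(32, 128)                # dummy spectrogram frames
        targets = (torch.rand(32, 3) > 0.9).float()   # dummy onset targets

        for _ in range(100):
            # Player step: manipulate the data so transcription accuracy drops.
            augmented = (features + 0.1 * player(features)).clamp(0, 1)
            p_loss = -bce(transcriber(augmented), targets)
            opt_p.zero_grad(); p_loss.backward(); opt_p.step()

            # Transcriber step: learn to transcribe the manipulated data.
            augmented = (features + 0.1 * player(features)).detach().clamp(0, 1)
            t_loss = bce(transcriber(augmented), targets)
            opt_t.zero_grad(); t_loss.backward(); opt_t.step()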

    Trainable data manipulation with unobserved instruments

    Machine learning algorithms are the core components in a wide range of intelligent music production systems. As training data for these tasks is relatively sparse, data augmentation is often used to generate additional examples by slightly altering existing training data. User-defined augmentation techniques require a long parameter tuning process and typically use a single set of global variables. To address this, a trainable data manipulation system, termed player vs transcriber, was proposed for the task of automatic drum transcription. This paper expands the player vs transcriber model by allowing unobserved instruments to also be manipulated within the data augmentation and sample addition stages. Results from two evaluations demonstrate that this improves performance and suggest that trainable data manipulation could benefit additional intelligent music production tasks.
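
    A hedged sketch of the sample addition idea: one trainable gain per instrument, including instruments outside the target classes, so the system can learn how strongly each library hit is mixed into a training excerpt. The instrument names and feature shapes below are illustrative, not those of the paper.

        import torch

        # Dummy feature excerpts for sample-library hits; "tom" and "cymbal"
        # stand in for instruments that are not transcription targets.
        samples = {name: torch.rand(128)
                   for name in ["kick", "snare", "hihat", "tom", "cymbal"]}

        # One trainable gain per instrument, unobserved instruments included.
        gains = {name: torch.nn.Parameter(torch.tensor(0.0)) for name in samples}

        def add_samples(excerpt):
            # Mix each library hit into the excerpt at its learned level.
            for name, hit in samples.items():
                excerpt = excerpt + torch.sigmoid(gains[name]) * hit
            return excerpt.clamp(0, 1)

        augmented = add_samples(torch.rand(128))  # differentiable w.r.t. the gains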

    Automatic drum transcription using bi-directional recurrent neural networks

    Automatic drum transcription (ADT) systems attempt to generate a symbolic music notation for percussive instruments in audio recordings. Neural networks have already been shown to perform well in fields related to ADT, such as source separation and onset detection, due to their utilisation of time-series data in classification. We propose the use of neural networks for ADT in order to exploit their ability to capture a complex configuration of features associated with individual or combined drum classes. In this paper we present a bi-directional recurrent neural network for offline detection of percussive onsets from specified drum classes and a recurrent neural network suitable for online operation. In both systems, a separate network is trained to identify onsets for each drum class under observation, that is, kick drum, snare drum, hi-hats, and combinations thereof. We perform four evaluations utilising the IDMT-SMT-Drums and ENST minus-one datasets, which cover solo percussion and polyphonic audio respectively. The results demonstrate the effectiveness of the presented methods for solo percussion and a capacity for identifying snare drums, which are historically the most difficult drum class to detect.
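
    The per-class detector can be sketched as a bidirectional LSTM over spectrogram frames with a framewise sigmoid output; the layer sizes below are illustrative rather than those reported in the paper.

        import torch
        import torch.nn as nn

        class DrumOnsetBRNN(nn.Module):
            # Bidirectional recurrent network producing a framewise onset
            # activation for a single drum class.
            def __init__(self, n_features=84, hidden=50):
                super().__init__()
                self.rnn = nn.LSTM(n_features, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
                self.out = nn.Linear(2 * hidden, 1)

            def forward(self, x):  # x: (batch, time, n_features)
                h, _ = self.rnn(x)
                return torch.sigmoid(self.out(h)).squeeze(-1)  # (batch, time)

        # A separate network per observed class, as described above.
        models = {cls: DrumOnsetBRNN() for cls in ["kick", "snare", "hihat"]}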

    Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks

    Automatic drum transcription (ADT) is the process of generating symbolic notation for percussion instruments within audio recordings. To date, recurrent neural network (RNN) systems have achieved the highest evaluation accuracies for both drum solo and polyphonic recordings; however, the accuracies within a polyphonic context still remain relatively low. To improve accuracy for polyphonic recordings, we present two approaches to the ADT problem. First, to capture the dynamism of features in multiple time-step hidden layers, we propose the use of soft attention mechanisms (SA) and an alternative RNN configuration containing additional peripheral connections (PC). Second, to capture these same trends at the input level, we propose the use of a convolutional neural network (CNN), which uses a larger set of time-step features. In addition, we propose the use of a bidirectional recurrent neural network (BRNN) in the peak-picking stage. The proposed systems are evaluated along with two state-of-the-art ADT systems in five evaluation scenarios, including a newly-proposed evaluation methodology designed to assess the generalisability of ADT systems. The results indicate that all of the newly proposed systems achieve higher accuracies than the state-of-the-art RNN systems for polyphonic recordings and that the additional BRNN peak-picking stage offers a slight improvement in certain contexts.
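
    One reading of the soft attention mechanism is a learned weighting over hidden states from a window of neighbouring time-steps, pooled into a single context vector per frame. The sketch below is a generic attention pooling layer under that assumption; the window size and dimensions are placeholders.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SoftAttentionPool(nn.Module):
            # Attend over hidden states in a window around each frame.
            def __init__(self, hidden=100, context=9):
                super().__init__()
                self.score = nn.Linear(hidden, 1)
                self.context = context

            def forward(self, h):  # h: (batch, time, hidden)
                pad = self.context // 2
                h_pad = F.pad(h, (0, 0, pad, pad))
                windows = h_pad.unfold(1, self.context, 1)  # (batch, time, hidden, context)
                windows = windows.permute(0, 1, 3, 2)       # (batch, time, context, hidden)
                weights = torch.softmax(self.score(windows).squeeze(-1), dim=-1)
                return (weights.unsqueeze(-1) * windows).sum(dim=2)

        pooled = SoftAttentionPool()(torch.randn(2, 50, 100))  # (2, 50, 100)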

    Improved onset detection for traditional flute recordings using convolutional neural networks

    The usage of ornaments is a key attribute that defines the style of a flute performance within the genre of Irish Traditional Music (ITM). Automated analysis of ornaments in ITM would allow for the musicological investigation of a player's style and would be a useful feature in the analysis of trends within large corpora of ITM recordings. As ornament onsets are short and subtle variations within an analysed signal, they are substantially more difficult to detect than longer notes. This paper addresses the topic of onset detection for notes, ornaments and breaths in ITM. We propose a new onset detection method based on a convolutional neural network (CNN) trained solely on flute recordings of ITM. The presented method is evaluated alongside a state-of-the-art generalised onset detection method using a corpus of 79 full-length solo flute recordings. The results demonstrate that the proposed system outperforms the generalised system over a range of musical patterns idiomatic of the genre.
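
    A CNN onset detector of this kind is commonly framed as binary classification of a short spectrogram patch centred on each frame; the sketch below uses illustrative layer sizes, not the configuration evaluated in the paper.

        import torch
        import torch.nn as nn

        class FluteOnsetCNN(nn.Module):
            # Classify a spectrogram patch around each frame as onset or not.
            def __init__(self, n_bins=80, context=15):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d((2, 1)),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d((2, 1)),
                    nn.Flatten(),
                    nn.Linear(32 * (n_bins // 4) * context, 1),
                )

            def forward(self, x):  # x: (batch, 1, n_bins, context)
                return torch.sigmoid(self.net(x)).squeeze(-1)

        probs = FluteOnsetCNN()(torch.randn(4, 1, 80, 15))  # per-patch onset probability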